Running on the grid

Links: Linux Computing at Bowdoin College | Bowdoin Computing grid

The Bowdoin Computing Grid ("The Grid") is a group of Linux servers that appears as a single multiprocessor machine for running compute-intensive jobs. Anyone with a Bowdoin login can submit jobs to the queue.

There are multiple queues, some public and some private.

The queues starting with "cs" are private queues which connect to servers bought with my NSF grant for processing big data.

Please read the links above to learn how to submit jobs to the grid and how to control a running job.

  1. Before you try the grid, test your code and make sure it works. Test it on the small grids test1.asc and test2.asc. It should run to completion and compute the size of the viewshed for every point in the grid, in turn. Ideally the output would be a grid file; printing to stdout also works.
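    For example, if your executable were called multiviewshed and took the grid file as its only argument (both of these are assumptions; use whatever names and arguments your own code expects), a quick test would be:
    ./multiviewshed test1.asc
    ./multiviewshed test2.asc
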
  2. Archive your code:
    tar -cvf mycode.tar Code
    
    Make sure it contains a copy of set1.asc.
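    To double-check what went into the archive before copying it over, you can list its contents (tar -t lists the archive, and grep just filters for the grid file):
    tar -tvf mycode.tar | grep set1.asc
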
  3. Copy your code to dover:
    scp mycode.tar ltoma@dover.bowdoin.edu:~/
    
    This copies mycode.tar to your home directory on dover.
  4. Login on dover:
    ssh ltoma@dover.bowdoin.edu
    
    and extract your code:
    tar -xvf  mycode.tar 
    
  5. Create a script to run your code on the grid. Let's say you decide to call it grid.sh; you can edit it in emacs or vim. Here is what the grid script might look like:
    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #$ -M ltoma@bowdoin.edu -m be
    
    /people/faculty4/cs/ltoma/gridtest/timertest > /people/faculty4/cs/ltoma/gridtest/output.txt 
    
    You'll want to change the -M line in the header to your own email address (don't leave mine!! I'll get emails every time your jobs get submitted and completed).
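    For example, with a made-up login jdoe, that line would become:
    #$ -M jdoe@bowdoin.edu -m be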

    This script submits an executable named "timertest" by specifying its complete path, and redirects its output to the file output.txt (also specified by its complete path).

    You'll want to change this line to refer to your own executable and your own path. To find out your path, run:

    pwd
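
    For example, if pwd prints /home/yourlogin/Code (a made-up path, used only for illustration) and your executable is called multiviewshed, the line in your script would become:
    /home/yourlogin/Code/multiviewshed > /home/yourlogin/Code/output.txt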
    

    At the end, make sure you change the permissions of grid.sh so that it is executable:

    chmod +x grid.sh 
    
  6. Login to the machine that runs the grid engine:
    ssh moosehead
      
    and submit the job:
    qsub -q <queue> grid.sh
    
    To specify the queue where you want your job executed, you can choose from any of the queues starting with cs:
    cs1g@moosecs3             
    cs1g@moosecs4             
    cs4g@moosecs1             
    cs4g@moosecs10          
    cs4g@moosecs2             
    cs4g@moosecs9             
    cs512m@moosecs5       
    cs512m@moosecs6       
    cs512m@moosecs7       
    cs512m@moosecs8       
    
    So your command might look like:
    qsub -q cs1g@moosecs3 grid.sh
    
    The machines have the same processor but different amounts of RAM. Since set1.asc takes less than 1 MB of memory, any of the queues will be fine.

    Since we expect a multiviewshed computation on set1 to take hours, a smart way to choose your queue is to first check which queues are empty. Run

    qstat -f
    
    to see which queues exist and what is running on them. After my job was scheduled on cs1g@moosecs3, the output looked something like this:
    cs1g@moosecs3                  BI    0/1/1          0.03     lx24-amd64    
      31258 0.55500 grid1.sh   ltoma        r     11/19/2013 15:26:22     1        
    ---------------------------------------------------------------------------------
    cs1g@moosecs4                  BI    0/0/1          0.02     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs4g@moosecs1                  BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs4g@moosecs10                 BI    0/0/1          0.01     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs4g@moosecs2                  BI    0/0/1          0.01     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs4g@moosecs9                  BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs512m@moosecs5                BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs512m@moosecs6                BI    0/0/1          0.03     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs512m@moosecs7                BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    cs512m@moosecs8                BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    gpu@moosegpu1                  BI    0/0/1          0.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    gpu@moosegpu2                  BI    0/0/1          1.00     lx24-amd64    
    ---------------------------------------------------------------------------------
    gpu@moosegpu3                  BI    0/0/1          1.00     lx24-amd64    
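
    Two other standard grid engine commands are handy here: qstat restricted to your own jobs, and qdel to remove a job you no longer want (the job id, 31258 in the example output above, appears in the qstat listing and in the confirmation email):
    qstat -u $USER
    qdel 31258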
    
  7. Log out of moosehead. You should receive a confirmation email that your job was received, another when the job starts, and another when it finishes.

    Moosehead is an old machine that's meant solely for running the grid engine, so please do not use it for anything other than submitting jobs (no compiling, no emacs, etc).

    The email that notifies you that the job has completed also contains timing information for the job.

    You can submit many jobs at once, and then logout and go home. The Grid system will queue them up and run them one after the other, in the order received. There is no need to remain logged in while the jobs are running. You will receive an e-mail notification when each job starts, and another when the job finishes, along with statistics about the job.
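
    For example, assuming you have prepared several scripts (grid1.sh, grid2.sh and grid3.sh are made-up names here), you could submit them all in one go from moosehead:
    for s in grid1.sh grid2.sh grid3.sh; do qsub -q cs4g@moosecs1 $s; done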

    Once the jobs are done, you can log in to any of the Linux machines and find the results in the output files in the directory from which you ran the scripts.
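
    You can also copy an output file straight back to your own machine with scp; for example, using the path from the sample script above (substitute your own login and output path):
    scp ltoma@dover.bowdoin.edu:/people/faculty4/cs/ltoma/gridtest/output.txt .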


Last modified: Wed Nov 20 10:51:00 EST 2013